Decision-theoretic planning is a popular approach to sequential decision-making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy, and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
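To make the single-agent setting referenced above concrete (this sketch is not part of the paper itself), the following Python snippet computes Q* for a toy MDP by value iteration and then extracts a greedy optimal policy from it; the state/action sizes, transition probabilities, and rewards are invented purely for illustration.

```python
import numpy as np

# Toy MDP, purely illustrative: 2 states, 2 actions.
n_states, n_actions = 2, 2
gamma = 0.95  # discount factor

# T[s, a, s'] = transition probability; R[s, a] = expected immediate reward.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Dynamic programming on Q (value iteration):
#   Q*(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') * max_a' Q*(s',a')
Q = np.zeros((n_states, n_actions))
for _ in range(1000):
    Q_new = R + gamma * (T @ Q.max(axis=1))  # batched expectation over s'
    if np.abs(Q_new - Q).max() < 1e-8:       # stop once converged
        Q = Q_new
        break
    Q = Q_new

# Extract an optimal policy greedily from Q*.
policy = Q.argmax(axis=1)
print("Q*:\n", Q)
print("greedy policy:", policy)
```

In Dec-POMDPs no such simple recursion applies directly, which is precisely the gap the paper addresses.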